several recent papers, and appears highly useful in practice. When a
parametric transformation family such as power transformations is used,
then the transformation can be estimated by maximum likelihood. The MLE,
however, is very sensitive to outliers. In this article, we propose
diagnostics which indicate cases influential for the transformation or
regression parameters. We also propose a robust bounded-influence
estimator similar to the Krasker-Welsch regression estimate. Both the
diagnostics and the robust estimator can be implemented on standard
software.

1. INTRODUCTION

In regression analysis, the response $y$ is often transformed for two
distinct purposes: to induce normally distributed, homoscedastic errors
and to improve the fit to some simple model involving explanatory
variables $x$. In many situations, however, $y$ is already believed to fit a
known model $f(x;\beta)$, $\beta$ being a p-dimensional parameter vector. If a
transformation of $y$ is still needed to remove skewness and/or
heteroscedasticity, then one can transform both $y$ and $f(x;\beta)$ in the same
manner. Specifically, let $y^{(\lambda)}$ be a transformation indexed by the
parameter $\lambda$ and assume that for some value of $\lambda$

$$y_i^{(\lambda)} = f^{(\lambda)}(x_i;\beta) + \sigma\epsilon_i \tag{1}$$

where $\epsilon_1,\ldots,\epsilon_N$ are independent and at least approximately normally
distributed. Notice the difference between (1) and the usual approach of
transforming only the response, not $f(x;\beta)$, i.e.,

$$y_i^{(\lambda)} = f(x_i;\beta) + \sigma\epsilon_i. \tag{2}$$
It should be emphasized that model (1) is not a substitute for (2). Both
models are appropriate, but under different circumstances. Model (2) has
been amply discussed by Box and Cox (1964) and others, e.g., Draper and
Smith (1980) and Cook and Weisberg (1982). Typically in model (2),
$f(x;\beta)$ is linear, but in principle nonlinear models can be used. Model
(1), which we call "transform both sides", has been discussed extensively
in Carroll and Ruppert (1984), Snee (1985), and Ruppert and Carroll
(1986), and we will only summarize those discussions. According to (1),
$f(x;\beta)$ has two closely related interpretations: $f(x;\beta)$ is the value of $y$
when the error is zero, and it is the median of the conditional
distribution of $y$ given $x$. In Carroll and Ruppert (1984), we were
concerned with situations where a physical or biological model provides
$f(x;\beta)$ but where the error structure is a priori unknown. Examples by
Snee (1985), Carroll and Ruppert (1984), Ruppert and Carroll (1986), and
Bates, Wolf, and Watts (1985) show that transforming both sides can be
highly effective with real data, both when a theoretical model is
available and, as Snee shows, when $f(x;\beta)$ is obtained empirically.

By estimating $\lambda$, $\sigma$ and $\beta$ simultaneously, rather than simply fitting
the original response $y$ to $f(x;\beta)$, we achieve two purposes. Firstly, $\beta$
is estimated efficiently and therefore we obtain an efficient estimate of
the conditional median of $y$. Secondly, we model the entire conditional
distribution of $y$ given $x$, and, in particular, we have a model which can
account for the skewness and heteroscedasticity in the data. Carroll and
Ruppert (1984) discuss the importance of modeling the conditional
distribution of $y$ in a special case, a spawner-recruit analysis of the
Atlantic menhaden population. To specify the conditional distribution of
$y$, for fixed $\lambda$ let $h(y,\lambda)$ be the inverse of $y^{(\lambda)}$, i.e. $h(y^{(\lambda)},\lambda) = y$,


and let $F$ be the distribution function of $\epsilon_i$. We assume that $F$ is
approximately normal, but not necessarily exactly normal since if $y^{(\lambda)}$
is the Box-Cox (1964) modified power family,

$$y^{(\lambda)} = \begin{cases} (y^{\lambda}-1)/\lambda, & \lambda \neq 0, \\ \log(y), & \lambda = 0, \end{cases} \tag{3}$$

then $F$ must have finite support whenever $\lambda \neq 0$. The p-th quantile of $y$
given $x$ is

$$h\big([f^{(\lambda)}(x;\beta) + \sigma F^{-1}(p)],\ \lambda\big) \tag{4}$$

and the conditional mean of $y$ is

$$E(y|x) = \int_{-a}^{a} h\big([f^{(\lambda)}(x;\beta) + \sigma\epsilon],\ \lambda\big)\,dF(\epsilon), \tag{5}$$

where $-a \le x \le a$ is the support of $F$. Ruppert and Carroll (1986) discuss
estimation of (4) and (5). $E(y|x)$ is easily estimated by Duan's (1983)
"smearing" estimate, which estimates $F$ by the empirical distribution of
the residuals; see section 5.
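
The inverse $h$ and the support restriction on $F$ can be written out explicitly for the modified power family (3). The following is a short worked derivation added for illustration; it uses only (1) and (3), and the one-sided bounds are our own restatement rather than a formula quoted from the references.

```latex
% Inverse of the modified power transformation (3)
h(v,\lambda) =
\begin{cases}
(1+\lambda v)^{1/\lambda}, & \lambda \neq 0,\\[2pt]
\exp(v), & \lambda = 0 .
\end{cases}

% For \lambda \neq 0 the inverse exists only when 1 + \lambda v > 0.
% Putting v = f^{(\lambda)}(x;\beta) + \sigma\epsilon and using
% 1 + \lambda f^{(\lambda)}(x;\beta) = f^{\lambda}(x;\beta) gives
f^{\lambda}(x;\beta) + \lambda\sigma\epsilon > 0
\quad\Longleftrightarrow\quad
\begin{cases}
\epsilon > -\,f^{\lambda}(x;\beta)/(\lambda\sigma), & \lambda > 0,\\[2pt]
\epsilon < \;\; f^{\lambda}(x;\beta)/(|\lambda|\sigma), & \lambda < 0 .
\end{cases}
```

So for $\lambda \neq 0$ the errors are bounded on at least one side, which is why $F$ can be only approximately normal; when $\sigma$ is small relative to $f^{\lambda}(x;\beta)$ the excluded tail has negligible normal probability.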

Many data sets we examined have substantial outliers in the
untransformed response $y$, but not in the residuals $y^{(\hat\lambda)} - f^{(\hat\lambda)}(x;\hat\beta)$; the
transformation has accommodated, or explained, the outlying $y$'s. There
is still the danger, however, that a few outliers in $y$ can greatly affect
$\hat\lambda$ and $\hat\beta$. Outliers should not be automatically deleted or downweighted,
especially when they appear to be part of the normal variation in the
response, but it should be standard practice to detect and scrutinize
influential cases and, when outliers are present, to compare the MLE with a
robust estimator. In this paper we propose a diagnostic and a
"bounded-influence" estimator which can be used together for detecting
influential cases and for robustly estimating $\lambda$ and $\beta$.

Case deletion diagnostics for linear regression are discussed in
Belsley, Kuh, and Welsch (1980) and Cook and Weisberg (1982), and have
been extended to the response transformation model (2) by Cook and Wang
(1983) and Atkinson (1986). The last two papers approximate the change
in $\hat\lambda$ as single cases or subsets of cases are deleted. Subset deletion
can be unwieldy because of the large number of possible subsets. If
influential subsets are to be detected, one needs some strategy for
searching for them. Alternatively, one can examine weights from a robust
estimator with good breakdown properties.

Bounded-influence regression estimators, so-called because they
place a bound on the influence of each observation, have been proposed by
Krasker (1980), Hampel (1978), and Krasker and Welsch (1982), and this
last paper provides a good overview. Carroll and Ruppert (1985) proposed
a bounded influence transformation (BIT) estimator extending the
Krasker-Welsch estimator to the response transformation model (2).

In this paper we adapt Atkinson's (1986) diagnostics and the BIT
estimator to the "transform both sides" model. The basic technique is to
linearize the model (1) by a Taylor approximation at the MLE, and then to
apply ordinary regression diagnostics and bounded-influence estimates.

Our methods are designed to be easily implemented on standard
software. All our computations were performed on the SAS package using
PROC NLIN and rather simple data manipulations in PROC MATRIX and DATA
steps. The computations would also be straightforward on other software
packages. Our computational techniques can be applied to a
bounded-influence estimate for the response-transformation model (2), thus
eliminating the need for the lengthy FORTRAN program used in Carroll and
Ruppert (1985).


2. MAXIMUM LIKELIHOOD ESTIMATION


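Throughout this paper $y^{(\lambda)}$ is the modified power transformation
(3). Under model (1) the log-likelihood is

$$L(\beta,\lambda,\sigma) = \sum_{i=1}^{N} \log f_i(\beta,\lambda,\sigma)$$

where

$$\log f_i(\beta,\lambda,\sigma) = -\tfrac{1}{2}\log(2\pi\sigma^2) + (\lambda-1)\log(y_i) - \frac{1}{2\sigma^2}\big[y_i^{(\lambda)} - f^{(\lambda)}(x_i;\beta)\big]^2. \tag{6}$$

For fixed $\beta$ and $\lambda$,

$$\hat\sigma^2(\beta,\lambda) = N^{-1}\sum_{i=1}^{N}\big[y_i^{(\lambda)} - f^{(\lambda)}(x_i;\beta)\big]^2$$

maximizes $L(\beta,\lambda,\sigma)$ in $\sigma$. Thus, the MLE of $\beta$ and $\lambda$ maximizes

$$L_{\max}(\beta,\lambda) = L(\beta,\lambda,\hat\sigma(\beta,\lambda)) = -(N/2)\log\Big\{N^{-1}\sum_{i=1}^{N}\big[(y_i^{(\lambda)} - f^{(\lambda)}(x_i;\beta))/\dot{y}^{\lambda-1}\big]^2\Big\} + \text{constant} \tag{7}$$

where $\dot{y}$ is the geometric mean of $y_1,\ldots,y_N$. Therefore, $\hat\beta$ and $\hat\lambda$ minimize

$$\sum_{i=1}^{N}\big[(y_i^{(\lambda)} - f^{(\lambda)}(x_i;\beta))/\dot{y}^{\lambda-1}\big]^2. \tag{8}$$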

Following Box and Cox (1964), $\hat\beta$ and $\hat\lambda$ can be computed as follows.
For fixed $\lambda$ minimize (8) in $\beta$ by ordinary (typically nonlinear)
least-squares and call the minimizer $\hat\beta(\lambda)$. Plot $L_{\max}(\hat\beta(\lambda),\lambda)$ on a grid and
maximize graphically or numerically. This technique is particularly
attractive when $f$ is not transformed and $f(x;\beta) = x^T\beta$, for then (8) can be
minimized in $\beta$ by linear least-squares. When transforming both sides,
the technique is less attractive computationally but it does give the
confidence interval

$$\big\{\lambda:\ L_{\max}(\hat\beta(\hat\lambda),\hat\lambda) - L_{\max}(\hat\beta(\lambda),\lambda) \le \tfrac{1}{2}\chi^2_1(1-\alpha)\big\}, \tag{9}$$

where $\chi^2_1(1-\alpha)$ is the $(1-\alpha)$ quantile of the chi-square distribution with
one degree of freedom. Minimizing (8) simultaneously in $\lambda$ and $\beta$ is
straightforward with standard nonlinear regression software. One simply
fits a dummy variable identically equal to 0 to the pseudo-model

$$0 = \big[y_i^{(\lambda)} - f^{(\lambda)}(x_i;\beta)\big]/\dot{y}^{\lambda-1} + \text{error} \tag{10}$$

with regression parameter $(\beta,\lambda)$. Not only is the least-squares estimate
of $(\beta,\lambda)$ the MLE, but for small values of $\sigma$ and large $N$ estimating the
covariance matrix of $\hat\beta$ using the pseudo regression model (10) is
essentially equivalent to inverting the Fisher information for $(\lambda,\beta,\sigma)$;
see the appendix.
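
The simultaneous minimization of (8) can be sketched outside SAS as well. The following is a minimal illustration in Python, assuming NumPy and SciPy are available; the function names (`boxcox`, `pseudo_residuals`, `fit_tbs`, `ricker`) are hypothetical and are not taken from the paper, and `scipy.optimize.least_squares` stands in for PROC NLIN.

```python
import numpy as np
from scipy.optimize import least_squares


def boxcox(v, lam):
    """Modified power transformation (3); v must be positive."""
    v = np.asarray(v, dtype=float)
    if abs(lam) < 1e-8:
        return np.log(v)
    return (v**lam - 1.0) / lam


def pseudo_residuals(theta, x, y, f, gdot):
    """[y^(lam) - f^(lam)(x; beta)] / gdot^(lam - 1), with theta = (beta..., lam)."""
    beta, lam = theta[:-1], theta[-1]
    return (boxcox(y, lam) - boxcox(f(x, beta), lam)) / gdot ** (lam - 1.0)


def fit_tbs(x, y, f, beta0, lam0=1.0):
    """Minimize the sum of squares (8) jointly in (beta, lam)."""
    gdot = np.exp(np.mean(np.log(y)))  # geometric mean of the responses
    theta0 = np.append(np.asarray(beta0, dtype=float), lam0)
    fit = least_squares(pseudo_residuals, theta0, args=(x, y, f, gdot))
    return fit.x[:-1], fit.x[-1]


def ricker(s, beta):
    """Illustrative mean function: beta[0] * s * exp(beta[1] * s)."""
    return beta[0] * s * np.exp(beta[1] * s)
```

A call such as `fit_tbs(s, r, ricker, beta0=[1.8, -0.45])` returns the pair (beta estimate, lambda estimate). The sketch assumes the mean function stays positive during the iterations; a production fit would add bounds or a reparameterization to guarantee this.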


3.  DIAGNOSTICS 


Let $(\hat\lambda,\hat\beta)$ and $(\hat\lambda_{(i)},\hat\beta_{(i)})$ be the MLE's with and without case i,
respectively. The changes $\Delta\lambda_i = (\hat\lambda - \hat\lambda_{(i)})$ and $\Delta\beta_i = (\hat\beta - \hat\beta_{(i)})$ are easily
interpreted measures of influence, called the sample influence curve
(Cook and Weisberg (1982)). Unlike in linear regression, $\Delta\lambda_i$ and $\Delta\beta_i$
cannot be computed exactly without actually recomputing the MLE with case
i deleted. However, $\Delta\lambda_i$ and $\Delta\beta_i$ can be approximated by applying
Atkinson's (1986) "quick estimate" to a linearization of model (1).

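To approximate $\Delta\lambda_i$ and $\Delta\beta_i$ we linearize the model

$$y^{(\lambda)}/\dot{y}^{\lambda-1} = f^{(\lambda)}(x;\beta)/\dot{y}^{\lambda-1} + \text{error} \tag{11}$$

about $\hat\lambda$, $\hat\beta$. Let

$$z(\lambda,\beta) = \big[y^{(\lambda)} - f^{(\lambda)}(x;\beta)\big]/\dot{y}^{\lambda-1},$$

$$w(\lambda,\beta) = (\partial/\partial\lambda)\, z(\lambda,\beta),$$

$$u_i(\lambda,\beta) = (\partial/\partial\beta_i)\, z(\lambda,\beta) = -(\partial/\partial\beta_i)\, f^{(\lambda)}(x;\beta)/\dot{y}^{\lambda-1},$$

and

$$u(\lambda,\beta) = \big(u_1(\lambda,\beta),\ldots,u_p(\lambda,\beta)\big)^T.$$

Sometimes we will write $z(y,x;\lambda,\beta)$ instead of $z(\lambda,\beta)$ to emphasize the
dependence on $y$ and $x$. The same holds for $w(\lambda,\beta)$ and $u(\lambda,\beta)$. Also let
$\hat z = z(\hat\lambda,\hat\beta)$, $\hat w = w(\hat\lambda,\hat\beta)$, and $\hat u = u(\hat\lambda,\hat\beta)$. Then (11) is approximated by

$$\hat z = -(\lambda-\hat\lambda)\,\hat w - (\beta-\hat\beta)^T\hat u + \text{error}. \tag{12}$$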
If we fit equation (12) to the full data, then of course the least-squares
solution is $\lambda = \hat\lambda$ and $\beta = \hat\beta$. If instead we fit (12) with the ith case
deleted, then we obtain Atkinson's (1986) "quick estimate" approximation,
which we call $\lambda^{Q}_{(i)}$ and $\beta^{Q}_{(i)}$. Let $\Delta\lambda^{Q}_i = (\hat\lambda - \lambda^{Q}_{(i)})$ and
$\Delta\beta^{Q}_i = (\hat\beta - \beta^{Q}_{(i)})$ be the resulting
approximations to $\Delta\lambda_i$ and $\Delta\beta_i$. Because (12) is linear, refitting
without case i is easy using standard matrix identities, which have been
programmed in many statistical packages.

We used the following computational scheme on SAS. First (8) was
minimized by PROC NLIN to obtain $(\hat\lambda,\hat\beta)$, then $\hat z$, $\hat w$, and $\hat u$ were generated
in a DATA step, and finally the linear model (12) was fit on PROC REG.
PROC REG calculates the regression diagnostic DFBETAS$_i$ (Belsley, Kuh, and
Welsch 1980), which is a scaled version of $(\Delta\lambda^{Q}_i, \Delta\beta^{Q}_i)$. The unscaled
$(\Delta\lambda^{Q}_i, \Delta\beta^{Q}_i)$ is DFBETA$_i$ in the Belsley, Kuh, and Welsch nomenclature and is
not part of standard SAS output. If we are interested in cases with
relatively large values of $\Delta\lambda^{Q}_i$ then the scaling is immaterial.

Atkinson's (1986) equation (19) gives a simple formula for
calculating $\Delta\lambda^{Q}_i$ alone. We feel, however, that influence for both $\lambda$ and
$\beta$ should probably be assessed together and DFBETAS$_i$ is ideal for this.
When only the response is transformed, $\hat\beta$ depends heavily on $\hat\lambda$ and it is
sensible to estimate $\lambda$ first and then to estimate $\beta$. When transforming
both sides, $\hat\beta$ is usually very stable as $\lambda$ is perturbed and $\lambda$ and $\beta$ can
be treated simultaneously.

In the example in section 5 and in other examples that we will not
report, $\Delta\lambda_i$ and $\Delta\lambda^{Q}_i$ were often considerably different, which is
surprising since the quick estimate is reasonably accurate when $y$ alone
is transformed (Atkinson 1986). The difference is that here $u$ depends on
$\lambda$ and $\beta$. The approximation $\Delta\lambda^{Q}_i$ does indicate cases with relatively
large values of $\Delta\lambda_i$, and $\Delta\lambda^{Q}_i$ seems adequate for diagnostic purposes.

To obtain a single measure of joint influence for $(\lambda,\beta)$ one can
compute Cook's D or DFFITS (Belsley, Kuh, and Welsch (1980)) for the
pseudo model (12).
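
The case-deletion quantities for the linear pseudo model (12) follow from standard matrix identities. The sketch below is an illustration in Python with NumPy, not the SAS code used for the paper; `deletion_diagnostics` is a hypothetical name. It assumes the design matrix `X` has columns $(-\hat w, -\hat u_1, \ldots, -\hat u_p)$ and the response is $\hat z$, so that row i of the returned DFBETA array corresponds to the unscaled quick estimates $(\Delta\lambda^{Q}_i, \Delta\beta^{Q}_i)$.

```python
import numpy as np


def deletion_diagnostics(X, z):
    """Exact case-deletion quantities for the least-squares fit of z on X.

    No intercept is added. Returns the hat values h, DFBETA (row i is the
    full-data coefficient vector minus the one fitted without case i), and
    Cook's D.
    """
    X = np.asarray(X, dtype=float)
    z = np.asarray(z, dtype=float)
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    coef = xtx_inv @ X.T @ z
    e = z - X @ coef                                  # ordinary residuals
    h = np.einsum("ij,jk,ik->i", X, xtx_inv, X)       # hat matrix diagonal
    dfbeta = (X @ xtx_inv) * (e / (1.0 - h))[:, None]
    s2 = e @ e / (n - k)                              # residual mean square
    cooks_d = e**2 * h / (k * s2 * (1.0 - h) ** 2)
    return h, dfbeta, cooks_d
```

Because the full-data fit of (12) has coefficients equal to zero at the MLE, DFBETA here is simply minus the coefficient vector obtained without case i.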


4.  ROBUST  ESTIMATION 


A  general  approach  to  robust  estimation  is  to  minimize  asymptotic 
variance  subject  to  a  bound  on  the  gross-error  sensitivity.  This 
approach  was  begun  by  Hampel  (1968,  1974),  applied  to  regression  by 

Hampel  (1978),  Krasker  (1980)  and  Krasker  and  Welsch  (1982),  and  used  in 
the  response  transformation  problem  by  Carroll  and  Ruppert  (1985). 

Here we will find an estimator bounding the influence for the
parameters $\lambda$ and $\beta$. We will ignore $\sigma$, which can be estimated separately
with a robust scale functional, e.g. the MAD, applied to the residuals.
Let $\psi(x,y;\lambda,\beta)$ be the score function

$$\psi(x,y;\lambda,\beta) = \tfrac{1}{2}\begin{pmatrix}\partial/\partial\lambda \\ \partial/\partial\beta\end{pmatrix} z^2(\lambda,\beta) = z(\lambda,\beta)\begin{pmatrix} w(\lambda,\beta) \\ u(\lambda,\beta)\end{pmatrix}. \tag{13}$$

Since the MLE minimizes (8) it solves

$$\sum_{i=1}^{N} \psi(x_i,y_i;\hat\lambda,\hat\beta) = 0,$$

at least when $\lambda$ and $\beta$ are unconstrained and $f(x;\beta)$ is a smooth function
of $\beta$. The MLE is highly sensitive to cases with large values of $z(\hat\lambda,\hat\beta)$,
$w(\hat\lambda,\hat\beta)$, or $u(\hat\lambda,\hat\beta)$ corresponding to response outliers, high leverage
points for $\lambda$, and high leverage points for $\beta$, respectively.

A robust bounded-influence estimator $(\hat\lambda,\hat\beta)$ is found by solving

$$\sum_{i=1}^{N} W(y_i,x_i;\lambda,\beta)\,\psi(y_i,x_i;\lambda,\beta) = 0$$

where $W$ is a scalar weight function such that $W\psi$ is bounded. The optimal
choice of $W$ was first studied by Hampel (1968, 1974) for general
univariate parametric families. When choosing $W$, asymptotic efficiency
measured by the covariance of $(\hat\lambda,\hat\beta)$ must be balanced against robustness
measured by the norm of $W\psi$. For a multivariate parameter such as $(\lambda,\beta)$,
this balancing raises philosophical questions since there are many ways
of comparing covariance matrices or of norming vector functions. The
approach we take generalizes the Krasker-Welsch (1982) bounded-influence
regression estimates. Whether the Krasker-Welsch estimator is optimal in
any meaningful sense is an open question (Ruppert 1985), but it seems
quite satisfactory in practice.

Let $\ell$ be the gradient of $\log f(\beta,\lambda,\sigma)$ with respect to $(\beta,\lambda)$. For
any weighting function $W$, the influence function evaluated at $(y_i,x_i)$
will be defined as

$$IF(y,x;\lambda,\beta) = B^{-1}\,W(y,x;\lambda,\beta)\,\psi(y,x;\lambda,\beta)$$

where

$$B = N^{-1}\sum_{i=1}^{N} E_y\big\{W(y,x_i;\lambda,\beta)\,\psi(y,x_i;\lambda,\beta)\,\ell^T(y,x_i;\lambda,\beta)\big\}.$$

This definition of IF coincides with the usual definition when the $x$'s
are i.i.d. with some distribution $H$, and the averaging over $(x_1,\ldots,x_N)$ in the
definition of $B$ is replaced by expectation with respect to $H$. Our
definition is appropriate for fixed or random $x$'s. In the definition of
$B$ on page 5 of Carroll and Ruppert (1985), $W$ is incorrectly squared. In
that article, but not here, $\psi = \ell$. The asymptotic covariance matrix of
$(\hat\lambda,\hat\beta)$ is $V = B^{-1}A(B^{-1})^T$ where

$$A = N^{-1}\sum_{i=1}^{N} E_y\big\{W^2(y,x_i;\lambda,\beta)\,\psi(y,x_i;\lambda,\beta)\,\psi^T(y,x_i;\lambda,\beta)\big\}.$$

An intuitively reasonable way to norm $W\psi$ is to use the asymptotic
covariance; see Krasker and Welsch (1982) for further motivation and
discussion. The resultant measure of influence, the so-called
self-standardized gross-error sensitivity $\gamma$, is given by

$$\gamma^2 = \max_i\ \big[IF(y_i,x_i;\lambda,\beta)^T V^{-1} IF(y_i,x_i;\lambda,\beta)\big]
= \max_i\ \big\{[\psi(y_i,x_i;\lambda,\beta)^T A^{-1}\psi(y_i,x_i;\lambda,\beta)]\,W^2(y_i,x_i;\lambda,\beta)\big\}.$$

Note that $W$ has been incorrectly omitted from the last term in equation
(15) of Carroll and Ruppert (1985). $\gamma$ must be at least $(p+1)^{1/2}$. From
experience with other problems we suggest bounding $\gamma$ by $a(p+1)^{1/2}$, where
"a" is between 1.1 and 1.5, and a = 1.2 or 1.3 has generally been
satisfactory. To bound $\gamma$ by $a(p+1)^{1/2}$, we use the weighting function

$$W(y,x;\lambda,\beta) = \min\big\{1,\ a(p+1)^{1/2}\,[\psi(y,x;\lambda,\beta)^T A^{-1}\psi(y,x;\lambda,\beta)]^{-1/2}\big\}.$$

Here $A$ is defined implicitly since it depends upon $W$ and vice versa. In
practice, $\hat\lambda$, $\hat\beta$, and $A$ are estimated iteratively.
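
The two facts used above, the reduction of the self-standardized norm to a quadratic form in $A^{-1}$ and the lower bound on the sensitivity, follow from the definitions of $B$, $A$, and $V$. The short derivation below is added for illustration; the lower bound is a heuristic averaging argument in our own words, not a statement quoted from the references.

```latex
% Since V = B^{-1} A (B^{-1})^T, we have V^{-1} = B^T A^{-1} B, so
IF^T V^{-1} IF
  = W^2\, \psi^T (B^{-1})^T \big(B^T A^{-1} B\big) B^{-1} \psi
  = W^2\, \psi^T A^{-1} \psi .

% Averaging the quadratic form over the cases and over y gives
N^{-1} \sum_{i=1}^{N} E_y\big\{ W^2\, \psi^T A^{-1} \psi \big\}
  = \operatorname{tr}\big( A^{-1} A \big) = p + 1 ,

% and a maximum is never smaller than an average, hence
\sup\, W^2\, \psi^T A^{-1} \psi \;\ge\; p + 1 .
```

This is why a bound of the form $a(p+1)^{1/2}$ with $a > 1$ is the natural scale: $a = 1$ is the smallest value the sensitivity can attain.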

We used a simple iterative scheme:

(1) Fix a > 1. Let C be the total number of cycles. Set c = 1. Let $\lambda_p$ and
$\beta_p$ be preliminary estimates, probably the MLEs. Set $W_i = 1$.

(2) Define

$$\hat A = N^{-1}\sum_{i=1}^{N} W_i^2\,\psi(y_i,x_i;\lambda_p,\beta_p)\,\psi^T(y_i,x_i;\lambda_p,\beta_p).$$

In $\psi$ use the weighted geometric mean $\dot{y} = \exp\big(\sum W_i \log y_i / \sum W_i\big)$.

(3) Update the weights:

$$W_i = \min\big\{1,\ a(p+1)^{1/2}\,[\psi(y_i,x_i;\lambda_p,\beta_p)^T \hat A^{-1}\psi(y_i,x_i;\lambda_p,\beta_p)]^{-1/2}\big\}.$$

(4) Solve

$$\sum_{i=1}^{N} W_i\,\psi(y_i,x_i;\hat\lambda,\hat\beta) = 0.$$

(5) If c < C, set $\lambda_p = \hat\lambda$ and $\beta_p = \hat\beta$, c = c+1 and return to (2). If c = C then
stop.

In the examples discussed in section 5 and other examples that we
will not report, $\hat\lambda$ and $\hat\beta$ stabilized at C=2. Therefore, we recommend
C=2, or perhaps C=3 for small N or data sets with extremely influential
points. In fact C=1 seems adequate, at least for diagnostic purposes.
We calculated step (3) with a short program in PROC MATRIX and step (4)
was performed in PROC NLIN. PROC MATRIX is needed only to invert $\hat A$, and
the program should be easily modified when PROC MATRIX is replaced by an
interactive matrix language. Undoubtedly, the computations would also be
easy on other packages. We will call the final estimate the BITBS.
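
Steps (2) and (3) of the scheme amount to a few lines of matrix code. The following is a minimal sketch in Python with NumPy, offered as an illustration rather than a port of the PROC MATRIX program; the name `update_weights` and its argument layout (an N by (p+1) array whose rows are the score vectors at the preliminary estimates) are assumptions.

```python
import numpy as np


def update_weights(psi, w_prev, a=1.2):
    """One pass of steps (2)-(3).

    psi    : N x (p+1) array, row i = psi(y_i, x_i; lam_p, beta_p)
    w_prev : current weights W_i (all ones on the first cycle)
    a      : bound multiplier, a > 1
    Returns the updated weights and the matrix A-hat used to compute them.
    """
    psi = np.asarray(psi, dtype=float)
    w_prev = np.asarray(w_prev, dtype=float)
    n, dim = psi.shape                      # dim = p + 1
    A = (psi * (w_prev**2)[:, None]).T @ psi / n
    # Quadratic forms psi_i' A^{-1} psi_i, one per case.
    q = np.einsum("ij,jk,ik->i", psi, np.linalg.inv(A), psi)
    bound = a * np.sqrt(dim)
    with np.errstate(divide="ignore"):      # q = 0 gives weight 1
        w_new = np.minimum(1.0, bound / np.sqrt(q))
    return w_new, A
```

Step (4) is then a weighted nonlinear least-squares fit, for which any routine accepting case weights could be used; cases with weights well below 1 are the ones flagged as influential.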

This iterative method can also be used for the response
transformation model. Instead of using (13) one sets

$$\psi(x,y;\lambda,\beta) = \tfrac{1}{2}\begin{pmatrix}\partial/\partial\lambda \\ \partial/\partial\beta\end{pmatrix}\big\{[y^{(\lambda)} - f(x;\beta)]/\dot{y}^{\lambda-1}\big\}^2.$$

It should be noted that this approach differs from the BIT estimate of
Carroll and Ruppert (1985), since the BIT estimates $\sigma$ simultaneously.
Because the likelihood scores for $\beta$ and $\sigma$ are, respectively, linear and
quadratic in the residual, the joint bounded-influence estimator of $\lambda$, $\beta$,
and $\sigma$ behaves as a redescending "psi-function", e.g. a Hampel W-estimator
(Hampel 1978). Therefore, the influence on $\lambda$ and $\beta$ of extreme response
outliers approaches zero when the influence for $\sigma$ is simultaneously
bounded.

5. AN EXAMPLE


When  managing  a  fish  stock,  one  must  model  the  relationship  between 
the  annual  spawning  stock  size  and  the  eventual  production  of  new 

catchable-sized  fish  (returns  or  recruits)  from  the  spawning.  Ricker  and 
Smith  (1975)  give  numbers  of  spawners  (S)  and  returns  (R)  from  1940  until 
1967  for  the  Skeena  River  sockeye  salmon  stock.  Using  some  simple 

assumptions  about  factors  influencing  the  survival  of  juvenile  fish, 
Ricker (1954) derived the theoretical model

$$R = \beta_1 S \exp(\beta_2 S) = f(S;\beta) \tag{14}$$

relating R and S. Other models have been proposed, e.g. by Beverton and
Holt (1957). However, the Ricker model appears adequate and, in
particular, gives almost the same fit for this stock as the Beverton-Holt
model.

From figure 1, a plot of R against S, it is clear that R is highly
variable and heteroscedastic, with the variance of R increasing with its
mean. Several cases appear somewhat outlying, in particular #5, #19, and
#25. The model (14) was linearized about the MLE to form the pseudo
model (12) and the square root of Cook's D was plotted against case
number; see figure 2. Case #5 and especially case #12 stand out. In
figure 1 case #12 is somewhat masked by the heteroscedasticity since the
residual on the original scale $R_{12} - f(S_{12};\hat\beta)$ is relatively small, but
after transformation by the MLE $\hat\lambda = .314$ the residual $[R_{12}^{(\hat\lambda)} - f^{(\hat\lambda)}(S_{12};\hat\beta)]$
is substantially larger though still not excessive. Case #12 is an
extremely high leverage point, and its Hat matrix diagonal according to
model (12) is $h_{12} = 0.685$. An h value exceeding 2p/N = 6/28 = .214 is
considered high by Hoaglin and Welsch (1978). Since $h_5 = .23$, case #5 is
also a leverage point by this criterion. In figure 3, the residuals
$\hat\epsilon_i = [R_i^{(\hat\lambda)} - f^{(\hat\lambda)}(S_i;\hat\beta)]$ are plotted against S and, though #12 stands out,
it does not seem extremely outlying until one accounts for leverage. To
compensate for leverage, Belsley, Kuh, and Welsch (1980) suggest
standardizing $\hat\epsilon_i$ by its estimated standard error to produce

$$\text{RSTUDENT}_i = \hat\epsilon_i / \big[S(i)\,(1-h_i)^{1/2}\big],$$

where S(i) is the root mean square error without case i. For this data
set RSTUDENT$_{12}$ = -4.40!
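
RSTUDENT can be obtained without refitting N times, because the deleted residual mean square has a closed form. The sketch below is an illustration in Python with NumPy for a generic linear fit such as the pseudo model (12); the function name `rstudent` is hypothetical, and the deleted-variance identity used is the standard one for linear least squares.

```python
import numpy as np


def rstudent(X, z):
    """Externally studentized residuals and hat values for the fit of z on X."""
    X = np.asarray(X, dtype=float)
    z = np.asarray(z, dtype=float)
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, xtx_inv, X)   # hat matrix diagonal
    e = z - X @ (xtx_inv @ X.T @ z)               # ordinary residuals
    # Residual mean square with case i deleted, via the usual identity
    # SSE_(i) = SSE - e_i^2 / (1 - h_i), on n - k - 1 degrees of freedom.
    s2_del = (e @ e - e**2 / (1.0 - h)) / (n - k - 1)
    return e / np.sqrt(s2_del * (1.0 - h)), h


# Leverage cutoff quoted in the text: 2p/N with p = 3 parameters and N = 28.
leverage_cutoff = 2 * 3 / 28
```

Here `leverage_cutoff` evaluates to about 0.214, matching the value 6/28 used above to flag cases #5 and #12.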

In table 1 we present influence diagnostics applied to model (12),
the exact change $\Delta\lambda_i$, the quick estimate $\Delta\lambda^{Q}_i$, and another
approximation $\Delta\lambda^{N}_i$. It is evident that $\Delta\lambda^{Q}_i$ is not always close to
$\Delta\lambda_i$, and there are at least two possible causes of inaccuracy: (i)
$\Delta\lambda^{Q}_i$ uses a linearization of the parameters and (ii) in model (12) $\dot{y}$ is
held fixed rather than readjusted as cases are deleted. To isolate the
effects of cause (ii), we experimented with a different approximation.
We computed the nonlinear least squares estimate $\lambda^{N}_{(i)}$ minimizing (8)
when $\dot{y}$ is calculated from the full data but the sum of squares in (8)
is over $j \neq i$. Then we set $\Delta\lambda^{N}_i = (\hat\lambda - \lambda^{N}_{(i)})$. Of course, $\Delta\lambda^{N}_i$ is as
difficult to compute as $\Delta\lambda_i$ itself and is not of interest as a practical
approximation; we have calculated $\Delta\lambda^{N}_i$ just to learn why $\Delta\lambda^{Q}_i$ is
inaccurate. In table 1, $|\Delta\lambda_i|$ is large for i=5 and 12. In both cases,
$\Delta\lambda^{N}_i$ approximates $\Delta\lambda_i$ much better than $\Delta\lambda^{Q}_i$, which suggests that cause
(i) is the primary problem. It is clear, though, that small changes in $\dot{y}$
from deleting an outlying $y_i$ can have a notable effect on $\hat\lambda$. We have
compared $\Delta\lambda^{Q}_i$ and $\Delta\lambda_i$ for several other data sets: the kinetics data of
Carr (1960) analyzed in Box and Hill (1974) and Carroll and Ruppert
(1984), the Atlantic menhaden spawner-recruit data in Carroll and Ruppert
(1985), and the "population A" spawner-recruit data in Ruppert and
Carroll (1986). In all cases $\Delta\lambda^{Q}_i$, though not an accurate
approximation, did indicate large values of $|\Delta\lambda_i|$, and $\Delta\lambda^{Q}_i$ appears to
be an effective diagnostic of high influence.

We computed the BITBS estimate with bound a = 1.2 and C=3 cycles. In
table 2, $\hat\lambda$, $\hat\beta$, and the non-unit weights are given for the MLE and each
iteration of BITBS. The changes in the estimates and weights from the
first to the second iteration are small, and one iteration seems adequate
for most practical purposes; certainly two iterations suffice. Although
the BITBS estimate severely downweights case #12, the estimate of $\lambda$ only
changes from .31 to .13 in two iterations of BITBS, while the MLE becomes
$\hat\lambda = -.2$ if #12 is deleted. For these data, the BITBS detects the
influential points and reduces, but does not eliminate, their influence.
Case #12 was the year 1951 when a rock slide severely reduced recruitment
(Ricker and Smith 1975). To model recruitment under normal conditions
one would delete case #12 and refit. Without #12, the MLE and the BITBS
estimate are similar (table 3), and one would probably use the MLE. The
bulk of the data indicate that recruitment is highly heteroscedastic and
a severe transformation ($\lambda = -.2$) is needed to induce homoscedastic
errors. Because the anomalous #12 occurs where S is small, it indicates
less heteroscedasticity and a more moderate transformation ($\lambda = .3$) is
used if #12 is not deleted. Since case #12 is not from the target
population of normal spawning years, it seems safe to delete it. This data
set is an example where one might consider a robust estimator that gives
essentially zero influence to extreme outliers, e.g. a generalization to
transformation models of a redescending-psi M-estimator. We normally
prefer using a redescending M-estimator to rejecting outliers, since an
M-estimator has a known large sample distribution. The effects of outlier
rejection methods upon the MLE are not well understood, even
asymptotically. Here we distinguish between rejecting outliers based on
some statistical criterion, say a hypothesis test, and deleting
observations known to come from a non-target population.

Although the MLE and the BITBS are similar when #12 is deleted, two
cases, #4 and #5, are substantially downweighted by the BITBS. Case #4
had been masked when #12 was included, but #5 was already influential in
the presence of #12. One might explore further deletions, though without
further information we feel that #4 and #5 should be retained in the
final analysis. If in addition to #12, case #4, case #5, or both are
deleted, then the MLE changes noticeably; see Table 4. The deletion of
#5 and the deletion of #4 affect the MLE in somewhat opposite directions,
though deleting #4 affects the MLE more severely. The BITBS downweights
#4 somewhat more than #5, so effects of downweighting #4 and #5 tend to
cancel.

We do not view the "transform both sides" method chiefly as choosing
a new scale for analyzing the response, but rather as modeling the
conditional distribution of $y$ on the original scale. By "original"
scale, we mean the scale of primary scientific interest, usually also the
scale on which direct measurements have been made. The model

$$y^{(\lambda)} = f^{(\lambda)}(x;\beta) + \epsilon$$

leads to

$$y = y(\epsilon) = \begin{cases} \big(f^{\lambda}(x;\beta) + \lambda\epsilon\big)^{1/\lambda}, & \lambda \neq 0, \\ \exp\big(\log f(x;\beta) + \epsilon\big), & \lambda = 0. \end{cases}$$

We are assuming that $\epsilon$ is approximately normal and in particular
approximately symmetric, and this last point suggests that the
conditional median of $y$ given $x$ can be estimated by

$$\hat m(y|x) = f(x;\hat\beta).$$

The conditional mean is easily estimated by the "smearing" estimate of
Duan (1983). Let $r_i$ be the i-th residual

$$r_i = y_i^{\hat\lambda} - f^{\hat\lambda}(x_i;\hat\beta) = \hat\lambda\,\big(y_i^{(\hat\lambda)} - f^{(\hat\lambda)}(x_i;\hat\beta)\big).$$

Then the smearing estimate is

$$\hat E(y|x) = N^{-1}\sum_{i=1}^{N}\big\{f^{\hat\lambda}(x;\hat\beta) + r_i\big\}^{1/\hat\lambda} \tag{15}$$

with obvious changes if $\hat\lambda = 0$.
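
The smearing estimate (15) is a plain average over the fitted residuals and is easy to code. The following is a minimal sketch in Python with NumPy, given as an illustration only; `smearing_mean` and its argument names are hypothetical. It assumes that $f^{\hat\lambda}(x;\hat\beta) + r_i$ is positive for every i, and it takes the $\hat\lambda = 0$ case to mean averaging $\exp(\log f + r_i)$ with log-scale residuals.

```python
import numpy as np


def smearing_mean(f_new, f_fit, y, lam):
    """Smearing estimate (15) of E(y | x).

    f_new : fitted medians f(x; beta_hat) at the points of interest
    f_fit : fitted medians f(x_i; beta_hat) at the observed x_i
    y     : observed responses y_i
    lam   : estimated transformation parameter
    """
    f_new = np.atleast_1d(np.asarray(f_new, dtype=float))
    f_fit = np.asarray(f_fit, dtype=float)
    y = np.asarray(y, dtype=float)
    if abs(lam) < 1e-8:
        r = np.log(y) - np.log(f_fit)          # residuals on the log scale
        return f_new * np.mean(np.exp(r))
    r = y**lam - f_fit**lam                    # residuals r_i of the text
    base = f_new[:, None] ** lam + r[None, :]  # assumed positive
    return np.mean(base ** (1.0 / lam), axis=1)
```

To drop a case from the averaging, as is done for case #12 below, one would pass `f_fit` and `y` with that case removed.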

If $\lambda$ is 0 or if $\lambda^{-1}$ is an integer, then $E(y|x)$ can be estimated by
methods of Miller (1984). Miller's estimators are particularly simple
when $\lambda$ is in the set $\{-1, -\tfrac{1}{2}, 0, \tfrac{1}{2}, 1\}$. It is a common practice to round $\hat\lambda$
to a value in this set, in which case Miller's (1984) estimate is
applicable. We do not necessarily advocate rounding $\hat\lambda$, especially since
there is some theoretical evidence against this practice (Carroll 1982),
but when the rounded value is very plausible according to a hypothesis
test, then the rounding should have little effect on subsequent
inference. The common rationale for rounding $\hat\lambda$, to make the
transformation easily interpretable, is less compelling when one views $\lambda$
as part of a model for $y$ on the original scale. In the present example,
$R^{.3}$ or $R^{-.2}$ is admittedly of little direct biological and economic
interest, but the same is true of, say, log(R).

In figure 1, $\hat m(R|S)$ and $\hat E(R|S)$, both calculated without #12, are
plotted. $\hat E(R|S)$ was also estimated by fixing $\lambda = 0$, re-estimating $\beta$ and
$\sigma$, and using Miller's (1984) estimate:

$$f(x;\hat\beta)\,e^{\hat\sigma^2/2}.$$

The smearing estimate and Miller's estimate are so close that they would
be barely distinguishable had Miller's estimate been included in figure 1.

To show the influence of case #12 on $\hat m(R|S)$ and $\hat E(R|S)$, these
estimates were calculated both with and without case #12. When #12 is
deleted then not only are $\lambda$ and $\beta$ set equal to the MLE without #12, but
also the averaging in (15) is over $i \neq 12$.

The changes in the estimated median and mean caused by deletion of
case #12 are graphed in figure 4. As might be expected, deleting #12
caused the estimated median and mean recruitment to increase for small S,
especially for S near 300-400. The most dramatic change when deleting
#12 is a decrease in estimated median and mean recruitment for large S.
This decrease is largely brought about by the decrease in $\hat\beta_2$, since
$\beta_2$ controls the shape of the Ricker curve.

The  "transform  both  sides"  model  is  certainly  not  the  only  model 
that  would  be  appropriate  for  this  example.  Since  R  is  heteroscedastic 
but  not  greatly  skewed  one  should  consider  heteroscedastic  models  such  as 
$R = f(S;\beta) + \sigma S^{\alpha} \epsilon$  (16)

or

$R = f(S;\beta) + \sigma f^{\alpha}(S;\beta) \epsilon$,

where the variance of R is proportional to a power of S or of f. In
Ruppert and Carroll (1986), the model

$R^{(\lambda)} = f^{(\lambda)}(S;\beta) + \sigma S^{\alpha} \epsilon$  (17)

was fit to the Skeena River data, with case #12 included. The MLE was
$\hat\lambda = .75$ and $\hat\alpha = .5$. However, both $H_0$: $\alpha = 0$ and $H_0$: $\lambda = 1$ are
accepted by likelihood ratio tests at level .10, so models (14) and (16)
both appear reasonable for these data, though a re-analysis without case
#12 would be of interest. We plan to study diagnostics and robust
estimation for model (17) in the future.
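
A minimal sketch of the Gaussian log-likelihood for model (16) follows. It assumes the Ricker curve has the form $\beta_1 S \exp(\beta_2 S)$ and that $\epsilon$ is standard normal; the data values, parameter values, and function names are hypothetical and serve only to show how the power-of-S standard deviation enters the likelihood.

```python
import math

def ricker(s, beta1, beta2):
    """Ricker curve, assumed here to be beta1 * S * exp(beta2 * S)."""
    return beta1 * s * math.exp(beta2 * s)

def loglik_power_variance(spawners, recruits, beta1, beta2, sigma, alpha):
    """Gaussian log-likelihood for R = f(S; beta) + sigma * S**alpha * eps."""
    total = 0.0
    for s, r in zip(spawners, recruits):
        sd = sigma * s ** alpha               # standard deviation grows with S
        resid = r - ricker(s, beta1, beta2)
        total += (-0.5 * math.log(2.0 * math.pi) - math.log(sd)
                  - 0.5 * (resid / sd) ** 2)
    return total

# Hypothetical data, for illustration only (not the Skeena River values).
spawners = [200.0, 400.0, 600.0, 800.0, 1000.0]
recruits = [650.0, 1100.0, 1500.0, 1300.0, 1900.0]

# alpha = 0 gives the constant-variance model; alpha = 0.5 lets the
# standard deviation grow like the square root of S.
ll_alpha0 = loglik_power_variance(spawners, recruits, 3.5, -8e-4, 250.0, 0.0)
ll_alpha_half = loglik_power_variance(spawners, recruits, 3.5, -8e-4, 10.0, 0.5)
print(ll_alpha0, ll_alpha_half)
```

A likelihood ratio test of $H_0$: $\alpha = 0$ would compare twice the difference of the two maximized log-likelihoods with a chi-squared critical value on one degree of freedom; the values printed here are not maximized and are not suitable for that comparison.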


6.  SUMMARY 


When a response y is thought to fit a model $f(x;\beta)$, but y is
heteroscedastic and/or nonnormally distributed, then y and $f(x;\beta)$ can be
transformed in the same manner to induce approximately homoscedastic,
normal errors while retaining the model $f(x;\beta)$ for the conditional median
of y. Often outliers in the original response are accommodated by
transformation; that is, the outliers are seen to be the result of the
skewness or heteroscedasticity in the untransformed data.

In  some  situations,  an  outlier  will  indicate  a  substantially 
different  transformation  than  that  fitting  the  bulk  of  the  data.  In  our 
example  with  28  observations,  case  #12  is  a  response  outlier  associated 
with  a  small  value  of  the  conditional  median  (and  mean)  response. 
Therefore,  case  #12  counter-indicates  the  severe  heteroscedasticity  in 
the  rest  of  the  data,  and  deleting  #12  changes  the  estimated  power 
transformation from $\lambda = .3$ to $\lambda = -.2$.

Influential cases should be detected and scrutinized as a matter of
standard good statistical practice. In some situations, such as with
case  #12  in  our  example,  there  are  good  reasons  for  removing  an 
influential  case.  In  other  cases,  the  appropriate  treatment  of  the 
outliers  will  be  less  clear-cut. 

In this paper, we propose an approximation to the sample influence
curve. Although the approximation is not highly accurate, it is an
effective diagnostic for influential cases. We also propose a
bounded-influence estimator, which can be used to pinpoint influential
cases, or to accommodate them, or both. The diagnostic and the robust
estimator can both be computed with standard software.


APPENDIX 


There are at least three methods of estimating the covariance matrix
of $(\hat\beta,\hat\lambda)$: (i) Evaluate the Hessian of the negative log-likelihood at
$(\hat\beta,\hat\lambda,\hat\sigma)$. This is the observed Fisher information matrix, $I$. Use
the $(p+1) \times (p+1)$ upper-left submatrix of $I^{-1}$. (ii) Let $I^*$ be the
Hessian of $-L_{\max}(\beta,\lambda)$ evaluated at $(\hat\beta,\hat\lambda)$. Use $(I^*)^{-1}$. (iii) Use the
estimate from model (10) treated as a nonlinear regression problem. Also
there is a fourth method which only estimates the covariance matrix of
$\hat\beta$. Suppose we followed the Box-Cox method of estimation described in
section 2. Then we have obtained $\hat\lambda$ and we have estimated $\beta$ by
$\hat\beta(\hat\lambda)$, which is the least squares estimate when $y^{(\hat\lambda)}$ is fit to
$f^{(\hat\lambda)}(x;\beta)$ with $\lambda$ fixed at $\hat\lambda$. This fit also gives an estimated
covariance matrix for $\hat\beta$, which we will call the method (iv) estimate.

Methods (iii) and (iv) are the easiest to use since they can be
implemented on standard nonlinear regression software. Method (i) is
justified by the well-known large sample theory of maximum likelihood
estimation. Method (ii) ignores $\sigma$ and treats $L_{\max}(\beta,\lambda)$ as a
likelihood for $(\beta,\lambda)$.

In theorem A.2 below, we show that methods (i) and (ii) are
identical, not just for the transformation problem under study but in
general for parametric estimation where a parameter is eliminated by
maximizing the likelihood over that parameter.

In general, methods (i) and (ii) are not equivalent to (iii) even as
$N \to \infty$, but by theorem A.1 below all three estimates of var($\hat\beta$) are the
same in the limit as $N \to \infty$ and $\sigma \to 0$. Bickel and Doksum (1981) and
Carroll and Ruppert (1984) have let $N \to \infty$ and $\sigma \to 0$ simultaneously to
provide a simple asymptotic theory for transformations, since the usual
$N \to \infty$ and $\sigma$ fixed theory is complicated. "Small $\sigma$" asymptotics have
often proved to be good approximations to finite sample results when
checked against Monte Carlo. Moreover, in many data sets, especially
from engineering and the physical sciences, $\sigma$ does seem small in the
sense that the model fits the data very well.

In Carroll and Ruppert (1984) we show that methods (i) and (iv) are
equivalent estimates of var($\hat\beta$) as $N \to \infty$ and $\sigma \to 0$.

In summary, methods (i) and (ii) are identical, except of course
that method (ii) does not estimate var($\hat\sigma$) or the covariance of $\hat\sigma$ with
$\hat\beta$ and $\hat\lambda$. All four methods give asymptotically equivalent estimates of
var($\hat\beta$) as $N \to \infty$ and $\sigma \to 0$. It does not appear that method (iii)
correctly estimates var($\hat\lambda$) or cov($\hat\lambda,\hat\beta$). The confidence interval (9) for
$\lambda$ can be used in place of a standard error for $\hat\lambda$, but a $\delta$-method
standard error of $E(y|x)$ or $m(y|x)$ will require an estimate of cov($\hat\lambda,\hat\beta$)
and var($\hat\lambda$).
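
The $\delta$-method calculation mentioned above can be sketched as follows: the variance of a smooth function g of the estimates is approximated by the quadratic form of its gradient in the estimated covariance matrix. This is a minimal sketch, not the computation used for the paper; the gradient is taken by central differences, and the parameter values, the covariance matrix, and the function names are hypothetical. For $E(y|x)$ the parameter vector would include $\lambda$ and the covariance matrix would be the full one for $(\hat\beta,\hat\lambda)$.

```python
import math

def numerical_gradient(g, theta, rel_step=1e-6):
    """Central-difference gradient of g at theta, with steps scaled to each parameter."""
    grad = []
    for i, t in enumerate(theta):
        h = rel_step * abs(t) if t != 0 else rel_step
        up = list(theta)
        down = list(theta)
        up[i] = t + h
        down[i] = t - h
        grad.append((g(up) - g(down)) / (2.0 * h))
    return grad

def delta_method_se(g, theta, cov):
    """Standard error of g(theta_hat): sqrt(grad' * cov * grad)."""
    grad = numerical_gradient(g, theta)
    k = len(theta)
    var = sum(grad[i] * cov[i][j] * grad[j] for i in range(k) for j in range(k))
    return math.sqrt(var)

# Hypothetical estimates and covariance matrix for (beta1, beta2).
theta_hat = [3.8, -9.5e-4]
cov_hat = [[0.49, -1.5e-4],
           [-1.5e-4, 1.0e-7]]

def median_recruitment(theta, s=500.0):
    """Ricker curve at S = 500, assumed to be beta1 * S * exp(beta2 * S)."""
    return theta[0] * s * math.exp(theta[1] * s)

se_median = delta_method_se(median_recruitment, theta_hat, cov_hat)
print(f"delta-method s.e. of the fitted curve at S = 500: {se_median:.1f}")
```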

Programming method (ii) by computing the analytic second derivative
matrix of $L_{\max}$ is somewhat a bother, but the gradient of $L_{\max}$ is easily
programmed and can be differentiated numerically. Since

$L_{\max}(\beta,\lambda) = -(N/2) \log \hat\sigma^2(\beta,\lambda)$

where

$\hat\sigma^2(\beta,\lambda) = N^{-1} \sum_{i=1}^{N} z^2(y_i,x_i;\beta,\lambda)$

and since the gradient of $\hat\sigma^2$ at $(\hat\beta,\hat\lambda)$ is zero, the Hessian of $L_{\max}$ at
$(\hat\beta,\hat\lambda)$ is

$\nabla^2 L_{\max}(\hat\beta,\hat\lambda) = -\hat\sigma^{-2} \sum_{i=1}^{N} \begin{bmatrix} (\partial/\partial\beta)^T (zu) & (\partial/\partial\lambda)(zu) \\ (\partial/\partial\beta)^T (zw) & (\partial/\partial\lambda)(zw) \end{bmatrix}$

where all quantities on the right hand side are evaluated at $(\hat\beta,\hat\lambda)$. It
is not difficult to numerically compute the derivatives of (zu) and (zw)
with respect to $\beta$ and $\lambda$.
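
A numerical Hessian of $L_{\max}$ can be sketched in a few lines. Note that this sketch takes second differences of $L_{\max}$ itself rather than differencing an analytic gradient as suggested above, which is simpler to program but somewhat less accurate. It assumes the Box-Cox power family, a residual z normalized by the geometric mean of y, and a Ricker curve of the form $\beta_1 S \exp(\beta_2 S)$; the data, the evaluation point, and all function names are hypothetical.

```python
import math

def boxcox(v, lam):
    """Box-Cox power transformation of a positive value."""
    return math.log(v) if lam == 0 else (v ** lam - 1.0) / lam

def l_max(theta, x, y, f):
    """Profile log-likelihood -(N/2) log sigma_hat^2(beta, lambda), up to a constant.

    Assumes z_i = (y_i^(lam) - f^(lam)(x_i; beta)) / gm**(lam - 1),
    where gm is the geometric mean of y.
    """
    *beta, lam = theta
    n = len(y)
    gm = math.exp(sum(math.log(v) for v in y) / n)
    z = [(boxcox(yi, lam) - boxcox(f(xi, beta), lam)) / gm ** (lam - 1.0)
         for xi, yi in zip(x, y)]
    return -0.5 * n * math.log(sum(zi * zi for zi in z) / n)

def numerical_hessian(func, theta, rel_step=1e-4):
    """Central second differences, with steps scaled to each parameter."""
    k = len(theta)
    steps = [rel_step * (abs(t) if t != 0 else 1.0) for t in theta]

    def shifted(i, di, j, dj):
        point = list(theta)
        point[i] += di * steps[i]
        point[j] += dj * steps[j]
        return func(point)

    hess = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):
            value = (shifted(i, 1, j, 1) - shifted(i, 1, j, -1)
                     - shifted(i, -1, j, 1) + shifted(i, -1, j, -1))
            hess[i][j] = hess[j][i] = value / (4.0 * steps[i] * steps[j])
    return hess

def ricker(s, beta):
    """Ricker curve, assumed to be beta1 * S * exp(beta2 * S)."""
    return beta[0] * s * math.exp(beta[1] * s)

# Hypothetical spawner/recruit pairs (not the Skeena River data).
spawners = [200.0, 350.0, 500.0, 650.0, 800.0, 950.0, 1100.0, 1250.0]
recruits = [610.0, 1150.0, 1200.0, 1700.0, 1350.0, 2100.0, 1500.0, 1900.0]
theta = [3.5, -8e-4, 0.3]   # (beta1, beta2, lambda); in practice use the MLE

hess = numerical_hessian(lambda t: l_max(t, spawners, recruits, ricker), theta)
```

When `theta` is the MLE, the inverse of the negated matrix `hess` is the method (ii) covariance estimate; at the arbitrary point used here the matrix only illustrates the computation.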

In table A.1 we compare the method (ii), (iii) and (iv) standard
errors for the Skeena River data with case #12 deleted. The three
methods produce similar standard errors of $\hat\beta_1$ and $\hat\beta_2$. The standard error
of $\hat\lambda$ by method (iii) seems substantially inflated.

Theorem A.1 Suppose that $\{x_i\}$ are i.i.d. Then as $N \to \infty$ and
$\sigma \to 0$, methods (i) and (iii) of estimating the covariance matrix of
$\hat\beta$ are asymptotically equivalent.

Sketch of proof: Define

$I_{11} = \sum_{i=1}^{N} u u^T (y_i,x_i;\hat\beta,\hat\lambda)$,

$I_{12} = \sum_{i=1}^{N} u(y_i,x_i;\hat\beta,\hat\lambda)\, w(y_i,x_i;\hat\beta,\hat\lambda)$,

$I_{22} = \sum_{i=1}^{N} w^2(y_i,x_i;\hat\beta,\hat\lambda)$,

and

$I = \begin{bmatrix} I_{11} & I_{12} \\ I_{12}^T & I_{22} \end{bmatrix}$.

Then the estimated covariance matrix of $(\hat\beta,\hat\lambda)$ by method (iii) is
$s^2 I^{-1}$, where $s^2$ is the mean square error.

Now as $N \to \infty$

$N^{-1} I_{11} \to S := E[u(y_i,x_i;\beta,\lambda)\, u^T(y_i,x_i;\beta,\lambda)]$

and by Taylor expansions

$(N\sigma)^{-1} I_{12} \to 0$

and

$(N\sigma^2)^{-1} I_{22} \to D := E[(\partial/\partial\epsilon_i) w(y_i,x_i;\beta,\lambda)|_{\epsilon_i=0}\, \epsilon_i]^2$

as $N \to \infty$ and $\sigma \to 0$. (Note that $y_i$ is a function of $\epsilon_i$; see equation
(1).) Therefore, by method (iii) the estimate of var($(N^{1/2}/\sigma)\hat\beta$, $N^{1/2}\hat\lambda$)
converges to

$\begin{bmatrix} S^{-1} & 0 \\ 0 & D^{-1} \end{bmatrix}$.

By theorem 1 of Carroll and Ruppert (1984) method (i) has the same
asymptotic estimate of var($(N^{1/2}/\sigma)\hat\beta$) but a different estimate of
var($N^{1/2}\hat\lambda$).


Note: Some type of regularity conditions on $\{x_i\}$ are needed for the
asymptotics to hold. The assumption that $\{x_i\}$ are i.i.d. is convenient
but other assumptions could be used instead. For a rigorous result, an
appropriate regularity condition on f would also be needed.

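Theorem A.2: Let $L(\theta,\sigma)$, $\theta \in R^k$ and $\sigma \in R^q$, be a real-valued function.
Let $L_\theta$ and $L_\sigma$ be the first partial derivatives and let $L_{\theta\theta}$, $L_{\theta\sigma}$, and
$L_{\sigma\sigma}$ be the second partial derivatives of L. For each $\theta$ suppose $\sigma(\theta)$
satisfies

$L(\theta,\sigma(\theta)) = \sup_{\sigma} L(\theta,\sigma)$.  (A.1)

Then

$\{(\partial^2/\partial\theta\,\partial\theta^T) L(\theta,\sigma(\theta))\}^{-1}$

is the upper left $k \times k$ submatrix of

$\begin{bmatrix} L_{\theta\theta}(\theta,\sigma(\theta)) & L_{\theta\sigma}(\theta,\sigma(\theta)) \\ L_{\theta\sigma}^T(\theta,\sigma(\theta)) & L_{\sigma\sigma}(\theta,\sigma(\theta)) \end{bmatrix}^{-1}$.  (A.2)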


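Proof: It is enough to prove the theorem when q=1, for then the general
case follows by induction. By (A.1)

$L_\sigma(\theta,\sigma(\theta)) = 0$,

so that

$0 = (\partial/\partial\theta) L_\sigma(\theta,\sigma(\theta)) = L_{\theta\sigma}(\theta,\sigma(\theta)) + L_{\sigma\sigma}(\theta,\sigma(\theta)) (\partial\sigma(\theta)/\partial\theta)$.

Therefore

$\partial\sigma(\theta)/\partial\theta = -L_{\theta\sigma}(\theta,\sigma(\theta)) / L_{\sigma\sigma}(\theta,\sigma(\theta))$.  (A.3)

Next

$(\partial^2/\partial\theta\,\partial\theta^T) L(\theta,\sigma(\theta)) = L_{\theta\theta} + L_{\theta\sigma} (\partial\sigma/\partial\theta)^T + (\partial\sigma/\partial\theta) L_{\theta\sigma}^T + L_{\sigma\sigma} (\partial\sigma/\partial\theta)(\partial\sigma/\partial\theta)^T$  (A.4)

where all terms on the right-hand side of (A.4) are evaluated at
$(\theta,\sigma(\theta))$. Substituting (A.3) into (A.4) we have

$(\partial^2/\partial\theta\,\partial\theta^T) L(\theta,\sigma(\theta)) = L_{\theta\theta} - L_{\theta\sigma} L_{\theta\sigma}^T / L_{\sigma\sigma}$.

Using the identity $(A + UV^T)^{-1} = A^{-1} - A^{-1}UV^TA^{-1}/(1 + V^TA^{-1}U)$ if
$A \in R^{k \times k}$ and $U, V \in R^k$ (see problem 2.8, page 33 of Rao 1973) we have

$\{(\partial^2/\partial\theta\,\partial\theta^T) L(\theta,\sigma(\theta))\}^{-1} = L_{\theta\theta}^{-1} + (L_{\theta\theta}^{-1} L_{\theta\sigma} L_{\theta\sigma}^T L_{\theta\theta}^{-1}) / (L_{\sigma\sigma} - L_{\theta\sigma}^T L_{\theta\theta}^{-1} L_{\theta\sigma})$.  (A.5)

By another identity (see problem 2.7, page 33 of Rao 1973), (A.5) is the
$k \times k$ upper left submatrix of (A.2).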
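
Theorem A.2 can be checked numerically on a simple case. The sketch below uses the normal location-scale log-likelihood with k = q = 1, which is an illustrative choice and not the transformation model of this paper; the data and the value of $\theta$ are hypothetical. It compares the inverse second derivative of the profile $L(\theta,\sigma(\theta))$, obtained by central differences, with the upper-left entry of the inverse of the full 2x2 Hessian.

```python
import math

def profile_loglik(theta, y):
    """L(theta, sigma(theta)) for the normal model, sigma(theta)^2 = mean((y - theta)^2)."""
    n = len(y)
    s2 = sum((v - theta) ** 2 for v in y) / n
    return -0.5 * n * math.log(s2) - 0.5 * n

def full_hessian(theta, sigma, y):
    """Second partials of L(theta, sigma) = -n log(sigma) - sum((y - theta)^2) / (2 sigma^2)."""
    n = len(y)
    q = sum((v - theta) ** 2 for v in y)
    d = sum(v - theta for v in y)
    l_tt = -n / sigma ** 2
    l_ts = -2.0 * d / sigma ** 3
    l_ss = n / sigma ** 2 - 3.0 * q / sigma ** 4
    return l_tt, l_ts, l_ss

y = [2.1, 3.4, 1.9, 4.2, 2.8, 3.9]   # hypothetical data
theta = 2.5                           # any theta, not necessarily the MLE
sigma = math.sqrt(sum((v - theta) ** 2 for v in y) / len(y))  # maximizer over sigma

# Left side: inverse second derivative of the profile, by central differences.
h = 1e-4
second = (profile_loglik(theta + h, y) - 2.0 * profile_loglik(theta, y)
          + profile_loglik(theta - h, y)) / h ** 2
lhs = 1.0 / second

# Right side: upper-left entry of the inverse of [[l_tt, l_ts], [l_ts, l_ss]].
l_tt, l_ts, l_ss = full_hessian(theta, sigma, y)
rhs = l_ss / (l_tt * l_ss - l_ts ** 2)

print(lhs, rhs)
```

The two printed values agree up to finite-difference error, as the theorem asserts for every $\theta$ at which $\sigma(\theta)$ maximizes L.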
Table 1

Diagnostics for the Skeena River sockeye salmon data.

                                     Case Number
Diagnostic                 5           12          19          25

Residual                 917         -939        -882        -922
RSTUDENT                2.25  [illegible]       -1.93       -2.04
Hat diagonal             .23          .68         .08         .08
Cook's D                 .43         8.09         .09         .11
DFFITS                  1.23        -6.49        -.55        -.62
DFBETAS-$\beta_1$       -.46         -.71         .13         .19
DFBETAS-$\beta_2$        .55         1.38        -.33        -.41
DFBETAS-$\lambda$  [illegible]       6.06        -.11        -.11
[label illegible]       -.31         1.56        -.04        -.04
[label illegible]       -.17          .77        -.01        -.01
$\Delta\lambda_i$       -.10          .51        -.03        -.03
Table 2

Maximum likelihood and bounded-influence estimates
for the Skeena River data. No cases deleted.

                  C=0 (MLE)     [illegible]   [illegible]   [illegible]

$\hat\beta_1$       3.295          3.590         3.619         3.622
$\hat\beta_2$    -6.9998x10^-4  -8.307x10^-4   -8.49x10^-4   -8.50x10^-4
$\hat\lambda$       .3141          .1921         .1329         .1138
$w_5$               1.0            .448          .579          .647
$w_6$               1.0            .931          1.0           1.0
$w_{12}$            1.0            .253          .188          .172
$w_{19}$            1.0            .811          .857          .874
$w_{25}$            1.0            .733          .776          .790

Table 3

Maximum likelihood and bounded-influence estimates for
the Skeena River data. Case #12 deleted.

                  C=0 (MLE)       C=1           C=2

$\hat\beta_1$       3.78          3.98          3.89
$\hat\beta_2$    -9.54x10^-4   -10.2x10^-4   -9.93x10^-4
$\hat\lambda$      -.199         -.254         -.235
$w_4$               1.0           .377          .575
$w_5$               1.0           .448          .753
$w_6$               1.0           1.0           .946
$w_9$               1.0           1.0           .954
$w_{12}$          This case is deleted
$w_{18}$            1.0           1.0           .904
$w_{19}$            1.0           .781          .860
$w_{25}$            1.0           .703          .846


Table 4

Maximum likelihood estimation for the
Skeena River data with selected cases removed

                                 Cases Removed
                    #12         #4, #12       #5, #12     #4, #5, #12

$\hat\beta_1$       3.78          4.20          3.89          4.30
$\hat\beta_2$    -9.54x10^-4   -11.2x10^-4   -10.5x10^-4   -12.1x10^-4
$\hat\lambda$      -.199         -.428         -.126         -.392
Table A.1

Estimated standard errors for the Skeena River
sockeye salmon data without case #12

Method     s.e.($\hat\beta_1$)    s.e.($\hat\beta_2$)    s.e.($\hat\lambda$)

(ii)          0.698           3.17x10^-4          0.369
(iii)         0.711           3.33x10^-4          0.624
(iv)          0.694           3.06x10^-4

LIST  OF  FIGURES 


Fig.  1  -  Plot  of  returns  (or  recruits)  against  spawners  with  mean  and 
median  recruitment  estimated  without  case  #12.  Selected  cases  are 
identified. 

Fig.  2  -  Square  root  of  Cook's  distance  plotted  against  case  number. 

Fig. 3 - Residuals = $R_i - f(S_i;\hat\beta)$ from the full-data MLE plotted against
spawners. Selected cases are identified.

Fig.  4  -  Differences  in  mean  and  median  recruitment  estimated  without  and 
with  case  #12  plotted  against  spawners. 